In [1]:
import pandas as pd # our core data analysis toolkit
import numpy as np
import pylab as pl # plotting libraries
from ggplot import *
from pandas import read_json # a function for reading data in JSON files
# This option shows our plots directly in IPython Notebooks
%matplotlib inline
# This option gives a more pleasing visual style to our plots
pd.set_option('display.mpl_style', 'default')
# The location of our playtest data file
filepath = "2014-05-13 makescape playtest.json"
In [2]:
def loadDataSortedByTimestamp(filepath):
    x = read_json(filepath)
    x = x.sort(columns='timestamp')
    x.index = range(0, len(x))
    return(x)
ms = loadDataSortedByTimestamp(filepath)
Now that our data is loaded into the variable ms (I chose it as an abbreviation of MakeScape), let's look at it and make sure it's sane. One of the first things I'll do is check the list of columns that our data comes with.
In [3]:
ms.columns
Out[3]:
Whoa! That is a lot of columns. 33 columns, to be exact. We can check that by calling Python's function for determining the length of a collection:
In [4]:
len(ms.columns)
Out[4]:
But we should also check how many rows (in this case, how many distinct events) we have in our dataset.
In [5]:
len(ms) # returns 8505
Out[5]:
In [6]:
columns = ['key', 'timestamp']
ms.head(n=5)[columns]
Out[6]:
What's less-than-helpful right now is that those timestamps are just raw integers. We want to make sure those integers actually represent times when data could reasonably have been collected (and not, say, January of the year 47532, which actually happened once).
Thankfully, pandas comes with a function that can convert Unix epoch time integers into human-recognizable dates. In this case, what we'll do is create a new column called human-readable-timestamp by applying the pandas Timestamp() function to our existing integers. Then we'll check the data.
In [7]:
ms['human-readable-timestamp'] = ms.timestamp.apply(lambda x: pd.Timestamp(x, unit='ms'))
columns = ['key', 'timestamp', 'human-readable-timestamp']
ms[columns].head()
Out[7]:
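As an aside, the same conversion can be done in a single vectorized call instead of an apply() (a sketch, equivalent to the cell above):

ms['human-readable-timestamp'] = pd.to_datetime(ms.timestamp, unit='ms')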
So, it seems like we have way too many MakeConnectComponent
events. Earlier, I explained that we're averaging more than one MakeConnectComponent
event per second. But what if we wanted to think about whether that average really describes a typical time slice of our data? In other words, we might want to know how our MakeConnectComponent
events are distributed over time.
One way to think about that distribution is to ask: when do our MakeConnectComponent
events occur over time? Below, I'm going to use the ggplot
package to look at the cumulative distribution of the connection events. The syntax may seem wonky and complicated at first, but it's actually an elegant implementation of ggplot2, itself an implementation of Leland Wilkinson's Grammar of Graphics. If at first you're stymied by it, don't worry. I'll try to help break it down for you.
First, we're doing some basic manipulation to get a cumulative sum column. This is actually so dumb I'm almost embarrassed: I create a column by applying a lambda function to the timestamp column that always returns 1, then I just sum cumulatively over that column. Lastly, I apply another lambda function to format our timestamp (currently an integer) into a nicer-formatted timestamp.
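(If the apply-a-constant-and-cumsum trick feels clunky, here's a sketch of an equivalent one-liner for the cell below, using just a running row count:)

import numpy as np

# Equivalent to applying a constant 1 and summing cumulatively
connectionEvents['cumulativeCount'] = np.arange(1, len(connectionEvents) + 1)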
I'll break these plotting lines down one major line at a time:

- First, I create the ggplot object, which is essentially the basic kind of object from whence all plots are constructed in ggplot. aes() just stands for aesthetic mapping, where I'm telling ggplot how to map data to graphical features. In this case, I'm saying "map the values in the timestamp1 column to the x position of this plot, and map the values of cumulativeCount to the y position." It may seem trivial now, but the power of aesthetic mappings like this is that I can also map quantities (or categories) to other graphical properties, for example mapping other columns in my dataframe to the aesthetic properties of color or shape. For now, we'll just stick with mapping quantities to cartesian x and y coordinates. When creating the ggplot object, I also tell it what dataframe I'm talking about, so the local names of timestamp1 and cumulativeCount make sense in scope.
- Next, I add a geom_line() layer to draw the cumulative counts as a line. I could have used geom_point() instead, which would give us a bivariate scatterplot instead of a lineplot.
- I use ggtitle(), xlab(), and ylab() to set the text for the plot title and axis labels.
- I make a print() call to make sure my plot shows up in my interactive session.
- Finally, I use ggsave() to save my plot directly to a file. ggsave() is smart and it detects the desired type of output file based on the suffix you pass in as a filename. In this case I'm using the .PNG format for images, but if I wanted an infinitely scalable image I could have used a .PDF extension.
In [8]:
# Manipulating the data to get a cumulative sum
# and nicely formatted timestamps
connectionEvents = ms[ms.key == 'MakeConnectComponent']
connectionEvents['cumulativeCount'] = connectionEvents.timestamp.apply(lambda x: 1).cumsum()
connectionEvents['timestamp1'] = connectionEvents.timestamp.apply(lambda x: pd.Timestamp(x, unit='ms'))
# Creating the basic plot
p = ggplot(aes(x='timestamp1',
y='cumulativeCount'),
data=connectionEvents)
p = p + geom_line()
p = p + ggtitle('Cumulative Distribution of MakeConnectComponent Events')
p = p + xlab('Time')
p = p + ylab('Event Count')
# Showing the plot
print(p)
# Saving the plot
ggsave(plot=p,
filename='cumulativeDistributionOfMakeConnectComponent1.png')
This plot has a number of interesting features. Easily the most salient feature is the giant flatline in the center. It looks like there were effectively no MakeConnectComponent events between 2000 hrs on the first day and 1400 hrs on the second day. And that seems entirely reasonable: the game likely would have been shut off during the night, then fired up again for testing the next day.
The challenge is that on either side of the flatline the curves are quite steep. That means there could be a fair amount of information hiding in the areas of the plot where there was activity, but it's up to us to extract out that long, boring middle portion where nothing is happening. How can we do that?
Well, suppose what we wanted was to create two new plots, call them Day 1 and Day 2, that have just the interesting parts: the parts where the slope of the cumulative distribution is nonzero. (Note that when the slope is nonzero, that's when action is happening and we're registering events over time.) If we wanted two separate plots, all we have to do is figure out when, exactly, that boring stretch of time starts and when, exactly, it ends. And one way to do that would be if we could somehow compute the time that elapses between each successive event in our dataset. So let's do that.
We're going to compute the time difference between successive events, which I'm calling time deltas. It's worth taking a second to think about how the delta is defined:
For the $i$th event, the delta $\Delta_i$ is given by the simple equation below, where $t_i$ is the timestamp of the $i$th event:
$\Delta_i = t_i - t_{i-1}$
As a result, the very first event $(i = 0)$ in a series will have a diff value of NaN
or NaT
(Not a Time), because the -1st event is undefined. But the second event will have a diff value of (Time of Second event - Time of First event). The very last event will also have a value: (Time of Last event - Time of Penultimate event).
The handy thing about pandas is that every data series (and a column counts as data series) has a diff()
method, which does exactly what we want: it computes successive pairwise differences between events. If we apply the .diff()
function to our timestamps, we'll have exactly what we want: a column of numbers where each number represents the time elapsed since the event that came before.
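To make that NaN/NaT behavior concrete, here's a toy example (a sketch with made-up timestamps rather than our playtest data):

import pandas as pd

toy = pd.Series(pd.to_datetime([1000, 1250, 1900], unit='ms'))  # three made-up events
print(toy.diff())
# The first delta is NaT (there's no previous event to subtract from the first one);
# the next two are 250 ms and 650 ms.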
In [9]:
connectionEvents['delta1'] = connectionEvents.timestamp1.diff()
And, now that we have our column of deltas, how can we figure out where the big boring part starts? Well, the big boring part lasts a long time: multiple hours. So, what we're looking for is a huge time delta. Say a delta of more than five hours.
In [10]:
connectionEvents[connectionEvents.delta1 > np.timedelta64(5, 'h')]['timestamp']
Out[10]:
In [11]:
whenBoringPartEnds = 1400077592798 # the timestamp of the first event after the >5-hour gap, taken from the output of In [10]
day1 = connectionEvents[connectionEvents.timestamp < whenBoringPartEnds]
day2 = connectionEvents[connectionEvents.timestamp >= whenBoringPartEnds]
# Creating the day1 plot
p = ggplot(aes(x='timestamp1',
y='cumulativeCount'),
data=day1)
p = p + geom_line()
p = p + ggtitle('Cumulative Distribution of MakeConnectComponent Events\nDay1')
p = p + xlab('Time')
p = p + ylab('Event Count')
# Showing the plot
print(p)
# Saving the plot
ggsave(plot=p,
filename='cumulativeDistributionDay1.png')
# Creating the day2 plot
p = ggplot(aes(x='timestamp1',
y='cumulativeCount'),
data=day2)
p = p + geom_line()
p = p + ggtitle('Cumulative Distribution of MakeConnectComponent Events\nDay2')
p = p + xlab('Time')
p = p + ylab('Event Count')
# Showing the plot
print(p)
# Saving the plot
ggsave(plot=p,
filename='cumulativeDistributionDay2.png')
Whenever successive events share the same timestamp, the delta between them is zero. So, if four successive events all share the same timestamp, they culminate in a fourth event whose delta3 is zero.
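Here's a toy illustration of that claim (a sketch with made-up timestamps):

import pandas as pd

# Two ordinary events followed by four events that share one timestamp (values in ms)
toy = pd.Series([1000, 1500, 2000, 2000, 2000, 2000])
deltas = pd.DataFrame({'t': toy,
                       'delta1': toy.diff(),
                       'delta2': toy.diff().diff(),
                       'delta3': toy.diff().diff().diff()})
print(deltas)
# Only the fourth of the shared timestamps has delta1, delta2, and delta3 all equal to zero.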
Let's do a little inspection and find events whose third and fourth order deltas are zero. We'll use the pandas method diff()
, which takes an array of data and computes the difference between contiguous pairwise elements.
Our code below computes successive deltas (up to fourth-order), and assigns each array of deltas to its own column in the dataframe.
In [12]:
ms['delta1'] = ms.timestamp.diff()
ms['delta2'] = ms.delta1.diff()
ms['delta3'] = ms.delta2.diff()
ms['delta4'] = ms.delta3.diff()
# A boolean expression to select events where deltas 1–3 are all zero
thirdOrderZeroes = (ms.delta3 == 0) & (ms.delta2 == 0) & (ms.delta1 == 0)
# The columns we'll want to view
columns = ['key', 'timestamp', 'delta1', 'delta2', 'delta3', 'delta4']
ms[thirdOrderZeroes][columns]
Out[12]:
In [13]:
ms[2470:2485][columns]
Out[13]:
FishCapture events are followed by DisconnectComponent events that almost all share the same timestamp. It looks like event 2482 is part of a sequence: a fish was captured at 2479, and (because when a fish gets captured it blows out a circuit) a number of components got disconnected in events 2480-2484. Note that not all the disconnect events share the same timestamp.
I mentioned in the last section that there's a huge problem with our data. Let's take a look again at the counts of different types of events, but this time we'll focus on the top five most frequent events:
In [14]:
topFiveMostFrequentEvents = ms.groupby('key').count().sort(columns=['timestamp'], ascending=False)[:5]
topFiveMostFrequentEvents['timestamp']
Out[14]:
In our implementation of events, MakeConnectComponent
should be triggered once the system detects that two circuit elements have become connected (say, when a player bumps a resistor and a battery together). MakeDisconnectComponent
should be triggered once the system detects that two connected elements have become disconnected (say, when a player swipes a finger across a wire to cut it.)
That leads to Problem 1: we have far more MakeConnectComponent events than MakeDisconnectComponent events. And not just more, way more. Almost three times more. If you think about it, this doesn't make sense at all.
In our game, each block has a lone positive terminal and a lone negative terminal. And, each terminal only accepts a maximum of one connection, for simplicity. When players add blocks to the table, the blocks start out not connected to anything. The first MakeConnectComponent
event should happen when two free terminals from two different blocks get bumped together. And, if two terminals are connected, they can't get connected to other things without getting disconnected first. So, we should expect at least something close to parity: there should be about as many disconnect events as there are connect events. Otherwise, how can a bunch of connected terminals keep connecting to other things? (Again, remember that each terminal should accept a maximum of one connection.)
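A quick way to check that parity argument against the data (a sketch, assuming ms is loaded as above):

counts = ms.groupby('key').size()
connects = counts.get('MakeConnectComponent', 0)
disconnects = counts.get('MakeDisconnectComponent', 0)
# If every terminal accepts at most one connection, these should be roughly equal
print('connects: %d, disconnects: %d, ratio: %.2f'
      % (connects, disconnects, float(connects) / max(disconnects, 1)))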
So, that's pretty weird. And also possibly very bad. But, it gets even worse when you consider Problem 2.
In our game, we built an event that takes a snapshot of the entire state of the board at regular intervals. We did that because we knew not all actions in the game should generate events. For example, if we recorded every single time any block changed its position, that would be way too much data. On the other hand, we need to know when players move blocks even if those movements don't generate big game events (like completing a circuit). So, we compromised. Every second, the game stores a snapshot of information about the state of the board, and that event is called MakeSnapshot
.
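One quick way to sanity-check that once-per-second behavior (a sketch, assuming ms is loaded as above) is to look at the gaps between successive MakeSnapshot timestamps; apart from breaks between play sessions, they should sit near 1000 ms:

# Gaps (in ms) between successive MakeSnapshot events
snapshots = ms[ms.key == 'MakeSnapshot']
print(snapshots.timestamp.diff().describe())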
If you look at the table above (or the bar chart in our previous section), you'll notice that MakeSnapshot
comes in second in our Top 5 Most Frequent Events. And, it's not a small margin, either. MakeConnectComponent
is beating it by 25%. That is a Very Weird Thing.
If we assume that the snapshots are reliably firing every second, that means that, on average, the system is registering block-to-block connections more than once per second. If all that data were user-generated, that would mean that over a period of about 44 total minutes of gameplay, kids were connecting blocks at a rate of more than one block per second, every second, for the entirety of the 44 minutes.
To put that in perspective one more way:
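Here's one rough way to do that (a sketch; it assumes ms is loaded as above and treats the number of MakeSnapshot events as an approximation of the seconds of gameplay):

counts = ms.groupby('key').size()
secondsOfPlay = counts['MakeSnapshot']
connectsPerSecond = counts['MakeConnectComponent'] / float(secondsOfPlay)
print('%.1f minutes of play, %.2f connection events per second'
      % (secondsOfPlay / 60.0, connectsPerSecond))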
Rather than assuming children are connecting circuit elements at a ludicrously high rate (Problem 2), which also would seem physically impossible (Problem 1), it seems more likely that there's a problem with the game's data logging that's revealed by our data. So, let's go to the next section and explore a method for checking it out.
In [15]:
topFive = loadDataSortedByTimestamp(filepath)
topFiveMostFrequentEvents = list(ms.groupby('key').count().sort(columns=['timestamp']).index)[-5:]
frequencyFilter = topFive.key.apply(lambda x: x in topFiveMostFrequentEvents)
topFive = topFive[frequencyFilter]
topFive['delta1'] = topFive.timestamp.diff()
binBreaks = [-1, 1, 50, 100, 200, 300, 500, 1000]
# binBreaks = [1000, 2000, 3000, 4000, 5000]
p = ggplot(aes(x='delta1',
fill='key'),
data=topFive) + \
geom_histogram(breaks=binBreaks) + \
scale_x_continuous(breaks=binBreaks) + \
ggtitle('Distribution of Time Deltas Between Successive Events') + \
ylab('Number of Events') + \
xlab('Time Between Events (ms)')
# print(p)
# ggsave(p, "histogram.png")
# topFive.head(n = 20)[['timestamp', 'delta1', 'key']]
print(p)
So, we know the distribution of MakeConnectComponent
events is uneven. We know that because in the last section we plotted the cumulative distribution function and saw it had a widely variable slope. What I'd like to do now is get an idea of just what that distribution of elapsed event times looks like.
To do that, we're going to use a visualization called a kernel density estimate. Essentially, what we're doing is creating a smoothed empirical approximation of what the distributions of events look like.
We're also going to use another powerful feature of graphical analysis: what statistician [Bill Cleveland](http://cm.bell-labs.com/cm/ms/departments/sia/wsc/) and Edward Tufte call "small multiples." We're actually going to take a look at the distributions of time deltas for the top five most frequent kinds of events and graphically compare them.
In [16]:
topFive = loadDataSortedByTimestamp(filepath)
topFiveMostFrequentEvents = list(ms.groupby('key').count().sort(columns=['timestamp']).index)[-5:]
frequencyFilter = topFive.key.apply(lambda x: x in topFiveMostFrequentEvents)
topFive['delta1'] = topFive.timestamp.diff()
topFive = topFive[frequencyFilter]
p = ggplot(aes(x = 'delta1',
group='key'),
data=topFive)
p = p + geom_density() # a Kernel Density Estimate
p = p + scale_x_continuous(limits=[-1000, 20000])
p = p + facet_wrap(y='key',
ncol=1,
scales='fixed')
p = p + xlab('Time Between Successive Events (ms)')
p = p + ggtitle('Smoothed Kernel Density Estimates')
print(p)
ggsave(plot=p,
filename='kernelDensityEstimate.png')
In [17]:
connections = loadDataSortedByTimestamp(filepath)
connections = connections[connections.key == 'MakeConnectComponent']
connections['delta1'] = connections.timestamp.diff()
binBreaks = [0, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
p = ggplot(aes(x='delta1',
fill='key'),
data=connections) + \
geom_histogram(breaks=binBreaks) + \
scale_x_continuous(breaks=binBreaks) + \
ggtitle('Distribution of Time Deltas Between Successive Events') + \
ylab('Number of Events') + \
xlab('Time Between Events (ms)')
print(p)
ggsave(plot=p,
filename='histogram2.png')
In [18]:
topFive[topFive.key == 'MakeDisconnectComponent'][['key', 'delta1']].head()
print(topFive[topFive.key == 'MakeDisconnectComponent']['delta1'].describe())
topFive[topFive.key == 'MakeDisconnectComponent']['delta1'].plot(kind='kde')
# print(p)
# ggsave(p, "histogram.png")
# topFive.head(n = 20)[['timestamp', 'delta1', 'key']]
Out[18]:
In [19]:
connects = loadDataSortedByTimestamp(filepath)
connects = connects[connects.key == 'MakeConnectComponent']
connects['delta1'] = connects.timestamp.diff()
p = ggplot(aes(x='delta1',
fill='key'),
data=connects) + \
geom_histogram(breaks=binBreaks) + \
scale_x_continuous(breaks=binBreaks) + \
ggtitle('Distribution of Time Deltas Between Successive Events') + \
ylab('Number of Events') + \
xlab('Time Between Events (ms)')
print(p)
In [20]:
len(connections[connections.delta1 <= 1000]) # 1744 events
Out[20]:
In [21]:
columns = ['timestamp', 'key']
ms.groupby('key').count().sort(columns=['timestamp'], ascending=False)[columns]
Out[21]:
In [22]:
# We can also see what this looks like as a plot
msdata = ms.groupby('key').count().sort(columns=['timestamp', 'key'], ascending=False)
pl.figure(figsize=(8, 11)) # set the figure size up front; savefig() has no figsize argument
p = msdata['timestamp'].plot(kind='bar')
print(p)
pl.savefig("barChart.jpg",
dpi=300,
bbox_inches='tight')
From our meeting on 2014-05-30, Allison confirmed that MakeConnectComponent always generates a component_list of the two blocks being connected.
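We can spot-check that claim directly (a sketch, assuming ms is loaded as above) by looking at the lengths of the component_list values attached to connect events, skipping any nulls:

connectLists = ms[ms.key == 'MakeConnectComponent']['component_list']
print(connectLists.dropna().apply(len).value_counts())  # how many lists of each length
print(pd.isnull(connectLists).sum())                    # connect events with no list at all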
Matthew also said that every time a connection is established between blocks A and B, it generates two MakeConnectComponent events.
So, I tried investigating whether all MakeDisconnectComponent
events generate a component_list
with just two components.
But, my first query below failed, because it seems some events have null values for component_list
:
(ms[ms.key == 'MakeDisconnectComponent']['component_list']).apply(lambda x: len(x))
So, let's try another tactic. First, we'll create a boolean filter that flags events whose component_list is null.
In [23]:
nullComponentLists = pd.isnull(ms['component_list'])
ms[nullComponentLists & (ms.key == 'MakeDisconnectComponent')][['key', 'component_list']]
Out[23]:
Event 980 is the first disconnect event where there is no component_list, so let's check out what's happening in the neighborhood of that event.
In [24]:
ms[970:986][['key', 'timestamp']]
Out[24]:
So, it looks like events 980, 981, and 982 all share the exact same timestamp (down to the millisecond), with one final disconnection event happening only about 15ms later.
In [25]:
ms[980:984][['key', 'timestamp', 'delta1']]
Out[25]:
We can try to interrogate what's happening by looking at the component lists. First, let's look inside the circuit that was created at event 975. I'm calling list()
on it because I kind of don't understand how else to get pandas to get me the output format I need to inspect :-)
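(For what it's worth, a sketch of an alternative: Series.tolist() returns the same plain Python list, e.g. ms[975:978].component_list.tolist().)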
In [26]:
list(ms[975:978].component_list)
Out[26]:
So, the circuit is a 3-block circuit. After a fish hits that circuit, it fries all the virtual wires (because the fish are bioelectric), so we should expect to see 3 disconnect events:
In [27]:
list(ms[980:984].component_list)
Out[27]:
In [28]:
ms[980:984].timestamp
Out[28]: